Exploring how deep neural networks form phonemic categories

Authors

Tasha Nagamine

Michael L. Seltzer

Nima Mesgarani

Abstract

Deep neural networks (DNNs) have become the dominant technique for acoustic-phonetic modeling due to their markedly improved performance over other models. Despite this, little is understood about the computation they implement in creating phonemic categories from highly variable acoustic signals. In this paper, we analyzed a DNN trained for phoneme recognition and characterized its representational properties, both at the single node and population level in each layer. At the single node level, we found strong selectivity to distinct phonetic features in all layers. Node selectivity to specific manners and places of articulation appeared from the first hidden layer and became more explicit in deeper layers. Furthermore, we found that nodes with similar phonetic feature selectivity were differentially activated to different exemplars of these features. Thus, each node becomes tuned to a particular acoustic manifestation of the same feature, providing an effective representational basis for the formation of invariant phonemic categories. This study reveals that phonetic features organize the activations in different layers of a DNN, a result that mirrors the recent findings of feature encoding in the human auditory system. These insights may provide better understanding of the limitations of current models, leading to new strategies to improve their performance.
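The single-node analysis described in the abstract can be illustrated with a small sketch. It is not the authors' code: the function names, the synthetic activations, and the use of a plain per-class mean followed by an argmax are all assumptions made for illustration. Given hidden-layer activations for labelled frames, it averages each node's response per phoneme class and reports which class drives each node most strongly.

```python
import numpy as np


def mean_activation_by_class(activations, labels, n_classes):
    """Average each node's activation over all frames of each class.

    activations: (n_frames, n_nodes) array of hidden-layer outputs.
    labels:      (n_frames,) integer class label for every frame.
    Returns an (n_classes, n_nodes) array of mean activations.
    """
    means = np.zeros((n_classes, activations.shape[1]))
    for c in range(n_classes):
        means[c] = activations[labels == c].mean(axis=0)
    return means


def preferred_class(class_means):
    """Index of the class that drives each node most strongly."""
    return class_means.argmax(axis=0)


# Synthetic stand-in for real DNN activations (hypothetical data).
rng = np.random.default_rng(0)
n_frames, n_nodes, n_classes = 600, 4, 3
labels = rng.integers(0, n_classes, size=n_frames)
activations = rng.normal(0.0, 0.1, size=(n_frames, n_nodes))

# Make node k respond to class k % n_classes, so the toy layer has
# a known selectivity pattern that the analysis should recover.
for k in range(n_nodes):
    activations[labels == k % n_classes, k] += 1.0

class_means = mean_activation_by_class(activations, labels, n_classes)
preferred = preferred_class(class_means)
print(preferred)
```

On real data, `activations` would come from one hidden layer of a trained network and `labels` from a frame-level phonetic alignment. Grouping the labels by manner or place of articulation instead of by phoneme would give the feature-level view the abstract refers to.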

Similar resources

Analyzing distributional learning of phonemic categories in unsupervised deep neural networks

Infants' speech perception adapts to the phonemic categories of their native language, a process assumed to be driven by the distributional properties of speech. This study investigates whether deep neural networks (DNNs), the current state-of-the-art in distributional feature learning, are capable of learning phoneme-like representations of speech in an unsupervised manner. We trained DNNs wit...

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models

Deep neural networks (DNNs) are widely utilized for acoustic modeling in speech recognition systems. Through training, DNNs used for phoneme recognition nonlinearly transform the time-frequency representation of a speech signal into a sequence of invariant phonemic categories. However, little is known about how this nonlinear mapping is performed and what its implications are for the classifica...

Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques

Today’s state-of-art in speech recognition involves deep neural networks (DNN). These last years, a certain research effort has been invested in characterizing the feature representations learned by DNNs. In this paper, we focus on convolutional neural networks (CNN) trained for phoneme recognition in French. We report clustering experiments performed on activation maps extracted from the diffe...

Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale

Editorial: Neural Mechanisms of Perceptual Categorization as Precursors to Speech Perception

This research topic describes recent advances in understanding the brain functional organization for sensory categorization along with its implications for speech perception. Among the 14 papers, one theme is how neural representations of auditory and visual input are transformed across different scales of neural organization to enable speech perception, and another is the neural mechanisms of ...

Publication date: 2015
